Lua standard library/String library
String library

Lua strings are indexed starting at 1 for the first character (not at 0 as in C, PHP, and JavaScript). Negative indices count backward from the end of the string, so the last character is at position -1.

Caution: this string library is effectively byte-oriented and does not handle characters that occupy two or more bytes. (Wikia) For strings containing multibyte characters, use Scribunto's Ustring library instead.

string.byte

string.byte(s[, i[, j]])

Returns the byte values of the characters of s from position i through position j. i defaults to 1 and j defaults to i. Identical to mw.ustring.byte().

string.char

string.char(...)

Returns a string built by concatenating the characters whose byte values equal the given numeric arguments. See mw.ustring.char() for a similar function that uses Unicode codepoints rather than byte values.

string.find

string.find(s, p[, init[, plain]])

Returns the start and end positions of the first match of pattern p in string s, or nil if there is no match. init specifies the position at which the search starts. If plain is true, pattern matching is turned off and a plain substring search is performed. If the pattern contains captures, the captured values are returned as the third and subsequent return values. See mw.ustring.find() for a similar function extended as described in Ustring patterns and where the init offset is in characters rather than bytes.

string.format

string.format(fmt, ...)

Formats the arguments according to the format string fmt. The format string uses a limited subset of the printf format specifiers:
* Recognized flags are '-', '+', ' ', '#', and '0'.
* Integer field widths up to 99 are supported. '*' is not supported.
* Integer precisions up to 99 are supported. '*' is not supported.
* Length modifiers are not supported.
* Recognized conversion specifiers are 'c', 'd', 'i', 'o', 'u', 'x', 'X', 'e', 'E', 'f', 'g', 'G', 's', '%', and the non-standard 'q'.
* Positional specifiers (e.g. "%2$s") are not supported.

The conversion specifier 'q' is like 's', but formats the string in a form suitable to be safely read back by the Lua interpreter: the string is written between double quotes, and all double quotes, newlines, embedded zeros, and backslashes in the string are correctly escaped when written.

Conversion between strings and numbers is performed as specified in Data types; other types are not automatically converted to strings. Strings containing NUL characters (byte value 0) are not properly handled. Identical to mw.ustring.format().
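The behaviour of string.byte, string.char, string.find, and string.format described above can be sketched as follows. This is a minimal illustration assuming plain ASCII input; the sample strings and variable names are made up for the example.

```lua
-- Positions are 1-based byte offsets; negative values count from the end.
local s = "Hello, Lua"

-- string.byte: byte values for positions i..j (i defaults to 1, j to i).
local first = string.byte(s)            -- 72 ("H")
local last = string.byte(s, -1)         -- 97 ("a")
local b1, b2, b3 = string.byte(s, 1, 3) -- 72, 101, 108

-- string.char: the inverse, builds a string from byte values.
local word = string.char(76, 117, 97)   -- "Lua"

-- string.find: start and end position of the first match, or nil.
local from, to = string.find(s, "Lua")       -- 8, 10
local none = string.find(s, "xyz")           -- nil
-- With plain = true, "." is an ordinary character, not a pattern item.
local dot = string.find("a.b", ".", 1, true) -- 2
-- Captures are returned after the two positions.
local f2, t2, cap = string.find(s, "(%a+)$") -- 8, 10, "Lua"

-- string.format: printf-style formatting; %q quotes a string for Lua.
local line = string.format("%-5s|%03d|%.2f", "ab", 7, 3.14159)
-- line == "ab   |007|3.14"
local quoted = string.format("%q", 'say "hi"')
-- quoted == '"say \\"hi\\""'
```

Note that init must be passed explicitly (here as 1) whenever plain is used, because the arguments are positional.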
string.gmatch

string.gmatch(s, p)

Intended for use in a generic for loop. Returns an iterator function that, each time it is called, returns the next substring of s that matches pattern p. If the pattern specifies no captures, then the whole match is produced in each call. For this function, a '^' at the start of a pattern is not magic, as this would prevent the iteration. It is treated as a literal character. See mw.ustring.gmatch() for a similar function for which the pattern is extended as described in Ustring patterns.

string.gsub

string.gsub(s, p, repl[, n])

Replaces the parts of s that match pattern p with repl, which may be a string, a table, or a function. The optional n limits the number of replacements performed. Returns the resulting string and, as a second value, the number of replacements actually made.

If repl is a string, then its value is used for replacement. The character % works as an escape character: any sequence in repl of the form %n, with n between 1 and 9, stands for the value of the n-th captured substring. The sequence %0 stands for the whole match, and the sequence %% stands for a single %.

If repl is a table, then the table is queried for every match, using the first capture as the key; if the pattern specifies no captures, then the whole match is used as the key.

If repl is a function, then this function is called every time a match occurs, with all captured substrings passed as arguments, in order; if the pattern specifies no captures, then the whole match is passed as a sole argument.

If the value returned by the table query or by the function call is a string or a number, then it is used as the replacement string; otherwise, if it is false or nil, then there is no replacement (that is, the original match is kept in the string). See mw.ustring.gsub() for a similar function in which the pattern is extended as described in Ustring patterns.

string.len

string.len(s)

Returns the length of the string, in bytes. The length operator #s can be used instead. Is not confused by ASCII NUL characters. See mw.ustring.len() for a similar function using Unicode codepoints rather than bytes.
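The three replacement forms of string.gsub, together with string.gmatch and string.len, can be sketched as follows. This is an illustrative example only; the input strings, the vars table, and the variable names are assumptions made up for the demonstration.

```lua
-- string.gmatch: iterate over every match in a generic for loop.
local words = {}
for w in string.gmatch("one two  three", "%a+") do
    words[#words + 1] = w
end
-- words is now { "one", "two", "three" }

-- With captures, each iteration yields the captures, not the whole match.
local settings = {}
for k, v in string.gmatch("a=1, b=2", "(%w+)=(%w+)") do
    settings[k] = v
end

-- String replacement: %1 and %2 refer to the captures.
local swapped, count = string.gsub("hello world", "(%w+) (%w+)", "%2 %1")
-- swapped == "world hello", count == 1

-- The optional fourth argument limits the number of replacements.
local limited, n = string.gsub("aaaa", "a", "b", 2)
-- limited == "bbaa", n == 2

-- Table replacement: the first capture is the lookup key.
-- Keys missing from the table leave the original match in place.
local vars = { name = "Lua" }
local filled = string.gsub("$name and $other", "%$(%w+)", vars)
-- filled == "Lua and $other"

-- Function replacement: called with the captures (here, the whole match).
local doubled = string.gsub("1 2 3", "%d+", function(d)
    return tostring(tonumber(d) * 2)
end)
-- doubled == "2 4 6"

-- string.len counts bytes, including embedded NUL bytes.
local len = string.len("a\0b")
-- len == 3
```

Because string.gsub returns two values, assigning it to a single variable, as with filled and doubled, keeps only the resulting string.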
string.lower

string.lower(s)

Returns a copy of s with all letters converted to lowercase. Note that any byte of a multibyte character whose value coincides with that of a letter is converted as well. See mw.ustring.lower() for a similar function in which all characters with uppercase to lowercase definitions in Unicode are converted.

string.match

string.match(s, p[, init])

Returns the substring of s that matches pattern p. If the pattern contains captures, the captures are returned instead. init (default 1) specifies the position at which the search starts. See mw.ustring.match() for a similar function in which the pattern is extended as described in Ustring patterns and the init offset is in characters rather than bytes.

string.rep

string.rep(s, n)

Returns a string consisting of s repeated n times. Identical to mw.ustring.rep().

string.reverse

string.reverse(s)

Returns s with its bytes in reverse order. Multibyte characters are not handled correctly.

string.sub

string.sub(s, i[, j])

Returns the substring of s from position i through position j. If j is omitted, the substring runs to the end of the string. In particular, the call string.sub(s, 1, j) returns a prefix of s with length j, and string.sub(s, -i) returns a suffix of s with length i. See mw.ustring.sub() for a similar function in which the offsets are characters rather than bytes.

string.upper

string.upper(s)

Returns a copy of s with all letters converted to uppercase. Note that any byte of a multibyte character whose value coincides with that of a letter is converted as well. See mw.ustring.upper() for a similar function in which all characters with lowercase to uppercase definitions in Unicode are converted.

Patterns

Note that Lua's patterns are similar to regular expressions, but are not identical. In particular, note the following differences from regular expressions and PCRE:
* The escape character is '%' rather than the '\' used in Perl and similar languages.
* Dot (.) always matches all characters, including newlines.
* No case-insensitive mode.
* No alternation (the | operator).
* Quantifiers (*, +, ?, and -) may only be applied to individual characters or character classes, not to capture groups.
* The only non-greedy quantifier is -, which is equivalent to PCRE's *? quantifier.
* No generalized finite quantifier (e.g. the {n,m} quantifier in PCRE).
* The only zero-width assertions are ^, $, and the %f[set] "frontier" pattern; assertions such as PCRE's \b or (?=···) are not present.
* Patterns themselves do not recognize character escapes such as '\ddd'. However, since patterns are strings, these sorts of escapes may be used in the string literals used to create the pattern string.

Also note that a pattern cannot contain embedded zero bytes (ASCII NUL, "\0"). Use %z instead. Also see Ustring patterns for a similar pattern-matching scheme using Unicode characters.

Character class

A character class is used to represent a set of characters. The following combinations are allowed in describing a character class:
* x: (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself.
* .: (a dot) represents all characters.
* %a: represents all ASCII letters.
* %c: represents all ASCII control characters.
* %d: represents all digits.
* %l: represents all ASCII lowercase letters.
* %p: represents all punctuation characters.
* %s: represents all ASCII space characters.
* %u: represents all ASCII uppercase letters.
* %w: represents all ASCII alphanumeric characters.
* %x: represents all hexadecimal digits.
* %z: represents ASCII NUL, the zero byte.
* %A: All characters not in %a.
* %C: All characters not in %c.
* %D: All characters not in %d.
* %L: All characters not in %l.
* %P: All characters not in %p.
* %S: All characters not in %s.
* %U: All characters not in %u.
* %W: All characters not in %w.
* %X: All characters not in %x.
* %Z: All characters not in %z.
* %x: (where x is any non-alphanumeric character) represents the character x. This is the standard way to escape the magic characters. Any punctuation character (even a non-magic one) can be preceded by a '%' when used to represent itself in a pattern.
* [set]: represents the class which is the union of all characters in set. A range of characters can be specified by separating the end characters of the range with a '-'. All classes %x described above can also be used as components in set. All other characters in set represent themselves.
For example, [%w_] (or [_%w]) represents all alphanumeric characters plus the underscore, [0-7] represents the octal digits, and [0-7%l%-] represents the octal digits plus the lowercase letters plus the '-' character. The interaction between ranges and classes is not defined. Therefore, patterns like [%a-z] or [a-%%] have no meaning.
* [^set]: represents the complement of set, where set is interpreted as above.

Pattern items

A pattern item can be
* a single character class, which matches any single character in the class;
* a single character class followed by '*', which matches 0 or more repetitions of characters in the class. These repetition items will always match the longest possible sequence;
* a single character class followed by '+', which matches 1 or more repetitions of characters in the class. These repetition items will always match the longest possible sequence;
* a single character class followed by '-', which also matches 0 or more repetitions of characters in the class. Unlike '*', these repetition items will always match the shortest possible sequence;
* a single character class followed by '?', which matches 0 or 1 occurrence of a character in the class;
* %n, for n between 1 and 9; such an item matches a substring equal to the n-th captured string (see below);
* %bxy, where x and y are two distinct characters; such an item matches strings that start with x, end with y, and where the x and y are balanced. This means that, if one reads the string from left to right, counting +1 for an x and -1 for a y, the ending y is the first y where the count reaches 0. For instance, the item %b() matches expressions with balanced parentheses.
* %f[set], a frontier pattern; such an item matches an empty string at any position such that the next character belongs to set and the previous character does not belong to set. The set set is interpreted as previously described.
The beginning and the end of the subject are handled as if they were the character '\0'. Note that frontier patterns were present but undocumented in Lua 5.1, and officially added to Lua in 5.2. The implementation in Lua 5.2.1 is unchanged from that in 5.1.0.

Pattern

A pattern is a sequence of pattern items. A '^' at the beginning of a pattern anchors the match at the beginning of the subject string. A '$' at the end of a pattern anchors the match at the end of the subject string. At other positions, '^' and '$' have no special meaning and represent themselves.

Captures

A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the substrings of the subject string that match captures are stored ("captured") for future use. Captures are numbered according to their left parentheses. For instance, in the pattern (a*(.)%w(%s*)), the part of the string matching a*(.)%w(%s*) is stored as the first capture (and therefore has number 1); the character matching . is captured with number 2, and the part matching %s* has number 3.

As a special case, the empty capture () captures the current string position (a number). For instance, if we apply the pattern "()aa()" on the string "flaaap", there will be two captures: 3 and 5.
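The capture numbering, position captures, and several of the pattern items described above can be tried out with string.match, string.find, and string.gsub. The following is a small sketch; apart from the two patterns quoted in the Captures text, the subject strings and variable names are made up for illustration.

```lua
-- Captures are numbered by the order of their left parentheses.
local whole, ch, spaces = string.match("aaXb  ", "(a*(.)%w(%s*))")
-- whole == "aaXb  ", ch == "X", spaces == "  "

-- The empty capture () yields the current position as a number.
local p1, p2 = string.match("flaaap", "()aa()")
-- p1 == 3, p2 == 5

-- Greedy '*' takes the longest match, '-' the shortest.
local greedy = string.match("<b>x</b>", "<(.*)>")   -- "b>x</b"
local lazy = string.match("<b>x</b>", "<(.-)>")     -- "b"

-- Sets and complemented sets.
local ident = string.match("id_42-x", "[%w_]+")     -- "id_42"
local letters = string.match("abc123", "[^%d]+")    -- "abc"

-- %1 matches the same text as the first capture (here, the opening quote).
local q, inner = string.match('say "hi" now', "([\"'])(.-)%1")
-- q == '"', inner == "hi"

-- %b() matches a balanced pair of parentheses.
local balanced = string.match("f(a(b)c) d", "%b()") -- "(a(b)c)"

-- Frontier patterns: replace "the" only where it stands as a whole word.
local replaced, hits = string.gsub("the other the", "%f[%w]the%f[%W]", "a")
-- replaced == "a other a", hits == 2

-- '^' and '$' anchor only at the ends of a pattern.
local k, v = string.match("key=value", "^(%w+)=(%w+)$")
-- k == "key", v == "value"
local lit_from, lit_to = string.find("a^b", "a^b")  -- 1, 3 ('^' is literal here)
```

In the frontier example the second %f[%W] also succeeds at the very end of the subject, because the end of the string is treated as the character '\0', which is not alphanumeric.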